PSI-BLAST pseudocounts and the minimum description length principle
Abstract
Position specific score matrices (PSSMs) are derived from multiple sequence alignments to aid in the recognition of distant protein sequence relationships. The PSI-BLAST protein database search program derives the column scores of its PSSMs with the aid of pseudocounts, added to the observed amino acid counts in a multiple alignment column. In the absence of theory, the number of pseudocounts used has been a completely empirical parameter. This article argues that the minimum description length principle can motivate the choice of this parameter. Specifically, for realistic alignments, the principle supports the practice of using a number of pseudocounts essentially independent of alignment size. However, it also implies that more highly conserved columns should use fewer pseudocounts, increasing the inter-column contrast of the implied PSSMs. A new method for calculating pseudocounts that significantly improves PSI-BLAST's retrieval accuracy is now employed by default.
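As a rough illustration of what adding pseudocounts to the observed counts in an alignment column means, the following minimal sketch mixes the observed residue counts with a fixed number of pseudocounts distributed according to background frequencies, then converts the resulting probabilities to log-odds scores. It is a generic smoothing scheme written for this summary, not the method proposed in the article and not PSI-BLAST's actual implementation. The function names, the three-letter toy alphabet, and the background frequencies are all assumptions made for the example.

```python
import math
from collections import Counter


def column_probabilities(column, background, num_pseudocounts):
    """Estimate residue probabilities for one alignment column.

    Hypothetical helper: observed counts are combined with
    `num_pseudocounts` pseudocounts spread over the alphabet in
    proportion to the background frequencies.
    """
    counts = Counter(residue for residue in column if residue in background)
    observed_total = sum(counts.values())
    total = observed_total + num_pseudocounts
    return {
        residue: (counts.get(residue, 0) + num_pseudocounts * freq) / total
        for residue, freq in background.items()
    }


def column_scores(probabilities, background):
    """Log-odds scores (in bits) of the estimated probabilities
    relative to the background. Requires every probability to be
    positive, which holds when num_pseudocounts > 0."""
    return {
        residue: math.log2(probabilities[residue] / background[residue])
        for residue in background
    }


# Toy three-letter alphabet with assumed background frequencies.
background = {"A": 0.5, "L": 0.25, "K": 0.25}

# A column of four aligned residues, smoothed with 4 pseudocounts.
probs = column_probabilities("AAAL", background, num_pseudocounts=4)
scores = column_scores(probs, background)
```

In this sketch a smaller `num_pseudocounts` keeps the estimate closer to the observed counts, so a highly conserved column scores its dominant residue more sharply. That is the direction of the effect the abstract describes for conserved columns, although how the article actually chooses the number of pseudocounts is not reproduced here.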
Related articles
Complexity Approximation Principle
We propose a new inductive principle, which we call the complexity approximation principle (CAP). This principle is a natural generalization of Rissanen’s minimum description length (MDL) principle and Wallace’s minimum message length (MML) principle and is based on the notion of predictive complexity, a recent generalization of Kolmogorov complexity. Like the MDL principle, CAP can be regarded...
A tutorial introduction to the minimum description length principle
This tutorial provides an overview of and introduction to Rissanen’s Minimum Description Length (MDL) Principle. The first chapter provides a conceptual, entirely non-technical introduction to the subject. It serves as a basis for the technical introduction given in the second chapter, in which all the ideas of the first chapter are made mathematically precise. This tutorial will appear as the ...
A New Minimum Description Length
The minimum description length (MDL) method is one of the pioneer methods of parametric order estimation with a wide range of applications. We investigate the definition of two-stage MDL for parametric linear model sets and exhibit some drawbacks of the theory behind the existing MDL. We introduce a new description length which is inspired by the Kolmogorov complexity principle.
Minimum Description Length (MDL) Principle as a Possible Approach to Arc Detection
Detecting arcing faults is an important but difficult-to-solve practical problem. In this paper, we show how the Minimum Description Length (MDL) Principle can help in solving this problem. Mathematics Subject Classification: 68Q30, 93AXX
Layered Representation of Motion Video using Robust Maximum-Likelihood Estimation of Mixture Models and MDL
Representing and modeling the motion and spatial support of multiple objects and surfaces from motion video sequences is an important intermediate step towards dynamic image understanding. One such representation, called layered representation, has recently been proposed. Although a number of algorithms have been developed for computing these representations, there has not been a consolidated e...